[Cohere] Add cohere2_moe model support #1340

Open: Terrencezzj wants to merge 3 commits into ml-explore:main from Terrencezzj:cohere2_moe
Conversation

@Terrencezzj

[Cohere] Add cohere2_moe model support

  • Adds cohere2_moe architecture support to mlx-lm.
  • Adds compressed-tensors W4A16 loading support so that quantized Cohere2 MoE checkpoints can run on MLX.
  • Adds test_cohere2_moe to tests/test_models.py, covering model construction and the generation cache.
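
As a rough illustration of what W4A16 loading involves (4-bit weights, 16-bit activations), the sketch below unpacks signed 4-bit values from 32-bit words and rescales them with per-group scales. The function names `pack_int4`, `unpack_int4`, and `dequantize`, the low-nibble-first layout, the offset of 8, and the group size are all assumptions made for illustration. They are not taken from this PR or from the compressed-tensors source, and the actual checkpoint layout may differ.

```python
# Illustrative only: layout, offset, and names are assumptions, not the PR's code.

def pack_int4(values, offset=8):
    """Pack signed 4-bit integers into 32-bit words, eight per word, low nibble first."""
    if len(values) % 8 != 0:
        raise ValueError("number of values must be a multiple of 8")
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for i, v in enumerate(values[start:start + 8]):
            if not -offset <= v < 16 - offset:
                raise ValueError(f"value {v} does not fit in 4 bits")
            # Shift the signed value into the unsigned range 0..15 before packing.
            word |= (v + offset) << (4 * i)
        words.append(word)
    return words


def unpack_int4(words, offset=8):
    """Inverse of pack_int4: recover eight signed 4-bit integers from each word."""
    values = []
    for word in words:
        for i in range(8):
            nibble = (word >> (4 * i)) & 0xF
            values.append(nibble - offset)
    return values


def dequantize(q_values, scales, group_size):
    """Multiply each quantized value by the scale of the group it belongs to."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


q = [-8, -1, 0, 7, 1, 2, 3, 4]
packed = pack_int4(q)
restored = unpack_int4(packed)
weights = dequantize(restored, scales=[0.5, 2.0], group_size=4)
```

In a real loader the same arithmetic would run on whole tensors rather than Python lists; the list version is only meant to make the bit layout easy to follow.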

Test plan

python -m pip install -e .
python -m mlx_lm generate \
  --model /path/to/cohere2_moe_nvfp4 \
  --prompt "Solve this coding problem: given strings s and t, return the minimum number of characters to append to s so t becomes a subsequence." \
  --max-tokens 32768 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 0
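
To sanity-check whatever solution the model generates for the prompt above, a small reference implementation can help. This is a minimal sketch of one standard greedy approach, reading the prompt as "append characters to the end of s so that t becomes a subsequence of s". The function name `min_appends` is hypothetical and the code is not part of this PR.

```python
def min_appends(s: str, t: str) -> int:
    """Return how many characters must be appended to s so that t is a subsequence of s."""
    matched = 0  # length of the prefix of t already found, in order, inside s
    for ch in s:
        if matched < len(t) and ch == t[matched]:
            matched += 1
    # Whatever part of t was not matched has to be appended verbatim.
    return len(t) - matched


example = min_appends("coaching", "coding")
```

The greedy scan is enough here because matching each character of t at its earliest possible position in s never reduces how much of t can be matched later.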

@nastya236 added the enhancement (New feature or request) label on Jun 6, 2026
Labels: enhancement (New feature or request)

2 participants